ABSTRACT


A regression model is used by the Office of the Secretary
of Defense (OSD) to predict median rents in order to determine
the Variable Housing Allowance (VHA), a supplement to the Basic
Allowance for Quarters (BAQ). These allowances are paid to
service members in the continental United States. It is this
model that is reviewed in this thesis. Median rental data taken
from the annual VHA survey are used to test this model. The
analysis indicates that the model suffers from lack of fit,
rests on invalid assumptions, and is perhaps not even a
reasonable approach. A more sensible approach is used to
propose two other regression models.

These  models  are  a  Weighted  Regression  Model  which,  like 
the  current  model,  predicts  medians;  and  an  Analysis  of 
Covariance  model  which  predicts  or  analyzes  the  mean  rent. 
More  reasonable  predictions  of  median  and  mean  rent  are 
indicated  by  these  two  models  respectively. 
this research may not have been exercised for all cases of
interest. While every effort has been made, within the time
available, to ensure that the programs are free of computational
and logic errors, they cannot be considered validated.
Any application of these programs without additional verification
is at the risk of the user.
I.  INTRODUCTION


A.  BACKGROUND

VHA,  Variable  Housing  Allowance,  is  a  supplement  to  the 
BAQ,  Basic  Allowance  for  Quarters,  paid  to  service  members  who 
live  in  private  housing  in  the  United  States.  VHA  is  designed 
to  aid  the  service  member  who  is  assigned  to  a  "high  cost  area" 
of  the  United  States  where  the  median  monthly  cost  of  housing 
for  a  person  in  the  same  grade  or  dependency  status  exceeds  80% 
of  the  national  median  for  members  in  the  same  rank  or 
dependency status [Ref. 1:p. 2-1]. VHA is computed from the
following equation [Ref. 1:p. 2-2]:

$$
\text{VHA} = \left(\text{local median housing cost by paygrade and marital status}\right) - 80\% \times \left(\text{national median housing cost by paygrade and marital status}\right) \tag{1}
$$
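As a minimal illustration of equation (1), the sketch below computes a VHA figure for a single paygrade and marital status category. The function name and the dollar amounts are hypothetical and are not taken from the VHA survey.

```python
def vha(local_median: float, national_median: float) -> float:
    """Equation (1): local median housing cost minus 80% of the
    national median housing cost, for one paygrade and marital
    status category."""
    return local_median - 0.80 * national_median


# Hypothetical medians for one category (illustrative values only).
example_vha = vha(local_median=650.0, national_median=500.0)
print(f"VHA for the example category: {example_vha:.2f}")
```

Equation (1) is applied here exactly as stated; any floor, cap, or budgetary adjustment applied in practice is outside the scope of this sketch.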

The  law  specifies  that  each  member's  VHA  allowance  will  be 
determined  by  the  actual  housing  costs  currently  paid  by  the 
service member [Ref. 1:p. 2-2]. VHA rates are computed by the
Per Diem Travel and Transportation Allowance Committee Staff,
a subset of the Office of the Secretary of Defense (OSD), with
the aid of the Defense Manpower Data Center (DMDC). The basic
process  by  which  the  rates  are  computed  is  as  follows: 

1.  Distinct areas in which military members reside are
determined.

2.  Proper  sample  sizes  are  determined. 

3.  Survey  samples  of  housing  costs  are  taken,  edited  and 
median  rents  are  computed  for  each  category  of  paygrade, 
house  type,  number  of  bedrooms,  and  marital  status. 


4.  Preliminary  VHA  rates  for  each  area  and  dependency  status 
are  computed  by  determining  an  estimated  median  rent  for 
each  category  using  the  GPX  program  which  utilizes 
various  regression  analysis  techniques  and  smoothing 
procedures. (GPX is the name of the model developed by
OSD.)

5.  Preliminary  VHA  rates  are  reviewed  to  ensure  that  the 
rates  determined  by  GPX  are  in  line  with  the  cost 
guidelines  set  by  Congress. 

B.  CURRENT  VHA  COMPUTATIONAL  PROCESS 

The  computation  of  preliminary  VHA  rates  for  each  area 
(MHA  -  military  housing  area),  paygrade,  and  dependency  status 
has  developed  into  an  extremely  complicated  process.  Once  the 
median  rents  are  computed  for  each  category  of  house  type, 
number  of  bedrooms,  paygrade,  and  marital  status,  a  count  of 
the number of median rents per category is taken [Ref. 1:p.
2-56]. If the count in a category for a particular MHA is too
small, then a larger sample is obtained by incorporating median
rent information for the same category from geographically
close MHAs [Ref. 1:p. 2-58]. The information taken from these
nearby MHAs is then weighted: the closer, in miles, a nearby
MHA is to the original MHA, the more weight is placed on its
information [Ref. 1:p. 2-59]. A new vector of median rents,
incorporating the information from the geographically close
MHAs and dimensioned by the four categories above, is
calculated [Ref. 1:p. 2-59]. The purpose of computing this
vector of median rents is to find the underlying relationship
between the total pay of a military member and the amount of
rent a
military member will pay [Ref. 1:p. 2-60]. Let $P_{ijkl}$ = the
total pay for a person in the ith paygrade, in the jth
dependency status, who has k bedrooms in his or her home and
house type l. Let $T_{ijkl}$ equal the median rent for military
members in that same group. Then the current regression model
in use is:

$$\frac{1}{T_{ijkl}} = \frac{A}{P_{ijkl}} + B + e_{ijkl} \tag{2}$$

where $e_{ijkl}$ is the error term. Standard linear regression
techniques, which assume that the error is normally distributed
and homoscedastic with mean zero, are used to estimate A and B.
This in turn means that the distribution of inverted median
rent is normal and homoscedastic. It is not clear that these
assumptions are in any sense "reasonable". In fact if medians
tend to be normal, then the inverse will certainly not be
normal. Let $\hat{A}$ and $\hat{B}$ denote the regression estimates of A
and B, respectively. The estimates $\hat{A}$ and $\hat{B}$ are used to
determine the estimated median rents, $R_{ijkl}$, through the
equation

$$\frac{1}{R_{ijkl}} = \frac{\hat{A}}{P_{ijkl}} + \hat{B} \tag{3}$$

where $R_{ijkl}$ and $P_{ijkl}$ denote the rent and total pay,
respectively, for paygrade, marital status, number of bedrooms
and house type [Ref. 1:p. 2-60]. Generally, a separate $\hat{A}$ and
$\hat{B}$ are determined for the enlisted, company grade officer, and
field grade officer ranks. Thus a separate $R_{ijkl}$ is computed
for each one of these three ranks of military personnel.
$R_{ijkl}$ is then used
to determine owner equivalency median rents. Owner equivalency
rents are the rent figures assigned to a military member who
owns and does not rent his or her residence. Costs assigned
to owners are thought not to be appropriate for use in
calculating VHA since intangible benefits accrue to owners and
not to renters. These owner equivalency median rents are
weighted according to the population percentage of owners and
are then incorporated into the vector of median rents [Ref.
1:p. 2-61]. This new vector of median rents, including both
owner and renter information, still has four dimensions and
must then be aggregated to the paygrade and dependency status
level [Ref. 1:p. 2-61]. After this aggregation, a further
smoothing process and a denormalization process, the VHA rate
multipliers are finally computed by dividing by a weighted
average of BAQ rates [Ref. 1:p. 2-63]. These multipliers are
checked and if an inversion exists (for example, when paygrade
O2 receives less VHA than paygrade O1), then additional
smoothing across paygrades takes place. If inversions still
exist after the smoothing process, then the entire computation
of VHA multiplier rates begins again from the point where data
from geographically close MHAs are used [Ref. 1:p. 2-64].
Median rent information is then taken from these MHAs and the
entire process is run again, up to 11 more times, until the
rate inversions cease to exist. If an inversion still exists
after 11 more runs, then the GPX program aborts and an
inversion in the total population data is assumed [Ref. 1:p.
2-64].
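The fitting and back-transformation steps of the current model can be sketched as follows, reading equation (2) as an ordinary least squares regression of inverse median rent on inverse total pay with slope A and intercept B. This is an illustrative sketch only and is not the GPX program; the function names and the data are hypothetical, and the data are generated to lie exactly on an inverse line so that the fit recovers the values used to build them.

```python
def fit_line(x, y):
    """Ordinary least squares slope and intercept for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_inverse_model(total_pay, median_rent):
    """Regress 1/T on 1/P; returns (A_hat, B_hat)."""
    inv_pay = [1.0 / p for p in total_pay]
    inv_rent = [1.0 / t for t in median_rent]
    return fit_line(inv_pay, inv_rent)


def predict_rent(a_hat, b_hat, total_pay):
    """Invert the fitted line to return an estimated median rent."""
    return 1.0 / (a_hat / total_pay + b_hat)


# Hypothetical data built from 1/T = 0.5/P + 0.001 (no error term).
pay = [1000.0, 1500.0, 2000.0, 3000.0]
rent = [1.0 / (0.5 / p + 0.001) for p in pay]

a_hat, b_hat = fit_inverse_model(pay, rent)
print(a_hat, b_hat, predict_rent(a_hat, b_hat, 2000.0))
```

Because the regression is carried out on the inverted scale, the error assumptions apply to 1/T rather than to T itself, which is the concern raised in the text above.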


C.  PROPOSED PLAN TO UPDATE VHA COMPUTATIONAL PROCESS

In an effort to get away from the geographical weighting
of data from close proximity MHAs and in an attempt to
simplify the process of computing VHA rates, the Per Diem
Committee is investigating a new method for computing VHA
rates. Under this "new" plan, survey data from each MHA is
placed  into  various  costing  bands  based  on  county  rental  data 
from  HUD  (Department  of  Housing  and  Urban  Development)  in  the 
following  manner.  From  each  county  in  the  United  States,  HUD 
has  data  for  the  average  rental  costs  in  that  county.  A 
military  housing  area  is  placed  into  a  costing  band  with  other 
military  housing  areas  which  have  the  same  average  rental 
costs. Therefore if the computed average rental cost for MHA
A is $260.00 and the average rental cost for MHA B is also
$260.00, MHA A and MHA B would be placed in the same costing
band.  The  computed  median  rent  figure  used  in  this  "new" 
process  is  a  single  figure  found  by  taking  a  weighted  average 
of  rental  costs,  based  on  number  of  bedrooms  and  house  type, 
from  the  national  military  population.  For  example,  if  10%  of 
the  national  military  population  resides  in  one  bedroom 
apartments,  the  average  rental  cost  of  one  bedroom  apartments 
for  that  MHA  accounts  for  10%  of  the  total  average  rental  cost 
figure for that county. Initially the bands will be broken
into groups of $45.00 increments. The costing bands begin at
an average rental cost of $260.00 and continue up to $890.00.
There is one further costing band which accounts for the
extremely high average rental cost areas, such as Alaska, which
are far above all of the other areas in terms of cost. Thus
there are a total of 15 different costing bands including the
"high" costing band. The idea behind grouping military housing
areas together which have similar average rental costs is to
provide more data points to reliably predict median rental
costs per paygrade and dependency status based on the survey
data. Also, using an "outside" (non-military) source to
group the data provides a small means of getting away from the
military raising its own VHA rates. The "intent of VHA is not
to reimburse the military member for what he or she pays for
housing costs but to enable the military person to live in
adequate housing in whichever area he or she is assigned".¹
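The banding rule described above (bands of $45.00 from $260.00 up to $890.00, plus one "high" band) can be sketched as follows. The text does not say how a cost exactly on a band boundary, or a cost below $260.00, is treated; the half-open intervals and the assignment of low costs to the first band are assumptions made here for illustration, and the function name is hypothetical.

```python
BAND_START = 260.0   # lowest average rental cost covered by the bands
BAND_WIDTH = 45.0    # width of each regular costing band
BAND_END = 890.0     # upper limit of the regular bands

# Number of regular bands between $260 and $890.
N_REGULAR_BANDS = int((BAND_END - BAND_START) / BAND_WIDTH)
HIGH_BAND = N_REGULAR_BANDS  # index of the extra "high" costing band


def costing_band(average_rent: float) -> int:
    """Return a zero-based costing band index for an MHA."""
    if average_rent >= BAND_END:
        return HIGH_BAND
    if average_rent < BAND_START:
        return 0  # assumption: very low costs fall in the first band
    return int((average_rent - BAND_START) // BAND_WIDTH)


# Two MHAs with the same average rental cost share a band.
print(costing_band(260.0), costing_band(260.0), N_REGULAR_BANDS + 1)
```

With these limits there are 14 regular bands, which together with the "high" band gives the 15 costing bands stated in the text.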

The  costing  bands  will  be  used  for  two  major  purposes.  One 
purpose  is,  through  the  use  of  an  appropriate  regression  model, 
to  determine  owner  equivalency  housing  costs,  and  the  other 
purpose  is  to  provide  housing  cost  data  when  there  is 
insufficient  data  in  a  category  to  determine  a  median  rent  for 
that  category.  Once  this  needed  data  is  found  it  will  be 
incorporated  back  into  the  MHA  data,  and  then,  within  the  MHA, 
a  median  rent  figure  will  be  computed  for  each  paygrade  and 
dependency status. This figure will then be utilized in the
congressionally mandated equation (1), local median rent - 80%
of national median rental cost, to determine the VHA rates for
that MHA. Of course these VHA rates are then subject to
budgetary constraints and congressional approval.

¹ From a conversation with Debra Davis, DMDC, June 1989.

D.  DATA  DESCRIPTION 

The data used to determine VHA rates are collected from
military members who participate in the VHA Survey, which is
taken every other year. The survey data are kept by the Defense
Manpower Data Center, which is the repository for all of the
data used in the VHA calculations. The data used in the VHA
process consist of
raw  survey  data  taken  from  each  military  housing  area,  and 
contain  information  such  as  what  type  of  house  a  military 
member  lives  in,  whether  it  is  a  single  family  home,  townhouse, 
apartment,  or  mobile  home,  how  many  bedrooms  the  house 
contains,  whether  or  not  the  military  member  has  any  dependents 
or  whether  he  or  she  shares  the  housing  costs  with  another 
military  member,  and  the  paygrade  and  service  of  the  military 
member.  Also  contained  in  the  data  for  each  military  person 
who  participates  in  the  survey  is  the  rental  cost,  utility 
costs,  and  maintenance  cost  of  the  housing.  Other  items  such 
as  social  security  numbers,  whether  the  member  rents  or  owns 
the  housing,  and  other  miscellaneous  information  are  also  part 
of  each  data  record  for  that  particular  military  person. 

The data used in this analysis, taken from the 1989
survey, consist of the paygrade (E1-O9) and dependency status
(having dependents, single, or single and sharing) of the
military member. In addition, the total housing cost for that
member, which consists of the rent plus the maintenance cost
plus the utility and insurance costs, is used. Further
information on the living space for the individual is also
needed, such as the number of bedrooms (1-4) and the type of
living space (detached house, townhouse, apartment, or
mobile home). Additionally, total pay (basic pay + BAQ) has
to be associated with each military member's dependency status
and paygrade in order to perform the regression analysis.
These raw data are edited to reflect only true rental costs, not
ownership costs. Thus one data record used in this analysis
consists of information regarding paygrade, house type, number
of bedrooms, dependency status, total housing costs, and total
pay.

From  this  initial  set  of  data  one  median  rent  for  each 
category  of  house  type,  number  of  bedrooms,  marital  status, 
and  paygrade  is  then  computed.  Thus  data  for  an  individual 
costing  band  which  might  have  consisted  of  over  50,000  records 
is  reduced  to  a  data  set  which  contains  a  maximum  of  1104 
records  which  reflects  all  of  the  possible  combinations  of 
paygrade,  house  type,  number  of  bedrooms  and  dependency  status. 

SAS  was  used  to  extract  and  edit  the  raw  data,  match  total 
pay  to  paygrade  and  dependency  status,  and  compute  a  median 
rent  figure  for  each  category  of  paygrade,  dependency  status, 
number  of  bedrooms,  and  house  type.  (An  example  of  this 
program  can  be  found  in  Appendix  B.) 
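The reduction from individual survey records to one median per category can be sketched as follows. This is not the SAS program of Appendix B; the field names and the sample records are hypothetical. The count of 23 paygrades assumes the numbering 1-23 used for the paygrade groupings in the analysis, which together with three dependency statuses, four bedroom counts and four house types gives the maximum of 1104 records.

```python
from collections import defaultdict
from statistics import median

# 23 paygrades x 3 dependency statuses x 4 bedroom counts x 4 house types
MAX_CATEGORIES = 23 * 3 * 4 * 4


def median_rent_by_category(records):
    """Group records by category and return {key: (median, count)}."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["paygrade"], rec["dependency"],
               rec["bedrooms"], rec["house_type"])
        groups[key].append(rec["housing_cost"])
    return {key: (median(costs), len(costs))
            for key, costs in groups.items()}


# Hypothetical records (illustrative values only).
sample = [
    {"paygrade": 5, "dependency": "with", "bedrooms": 2,
     "house_type": "apartment", "housing_cost": 400.0},
    {"paygrade": 5, "dependency": "with", "bedrooms": 2,
     "house_type": "apartment", "housing_cost": 450.0},
    {"paygrade": 5, "dependency": "with", "bedrooms": 2,
     "house_type": "apartment", "housing_cost": 520.0},
    {"paygrade": 12, "dependency": "single", "bedrooms": 1,
     "house_type": "townhouse", "housing_cost": 610.0},
]
medians = median_rent_by_category(sample)
print(MAX_CATEGORIES, medians)
```

Keeping the count alongside each median is a design choice made here because the number of records behind a median determines how reliable that median is.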
E.  PROBLEMS WITH THE DATA


There  is  one  major  problem  associated  with  the  data  used 
in the VHA computational process. The data used do not
include data from the military members who are in paygrades E5
and above and who share a residence with another person. These
data, which might provide further information and might enable
a more reliable estimate of median rents for an MHA to be
computed, are not being used. This is a policy decision. This
is  a  major  problem  in  the  computation  of  VHA  rates  because  one 
of  the  basic  reasons  for  the  existence  of  the  "costing  band" 
idea  and  one  of  the  major  problems  associated  with  the  current 
manner  in  which  VHA  rates  are  calculated,  is  the  sparsity  of 
data.  This  policy  decision  essentially  throws  away  what  could 
be  valuable  and  informative  data  and  is  contradictory  to  the 
purpose  of  finding  "good"  estimates  of  median  rents. 

F.  PURPOSE  OF  THESIS 

The  main  purpose  of  this  thesis  will  be  to  test  the 
validity of the currently used regression model, equation (2).
The  data  in  its  newly  proposed  format  of  costing  bands  will  be 
used.  If  the  current  regression  model  is  not  found  to  be 
adequate  then  the  second  goal  of  this  thesis  is  to  suggest  a 
better,  more  sensible  model  which  will  more  accurately  predict 
total  housing  costs  for  each  costing  band.  Thus  this  thesis 
will  basically  consist  of  two  different  types  of  analyses  and 
will analyze the MHA data from two vantage points. Since there
is no explanation as to why the inverse of rent is predicted
linearly by the inverse of pay (equation 2), a more sensible
regression model will be examined to explain the relationship
between total rent and total pay.

A secondary goal of this thesis will be to test the current
and any proposed regression models not only with the data that
are currently assigned to each costing band but also with
fifteen other costing bands composed of data from the
original costing band plus data from the military members who
are E5 and above who share housing with another person. Thus
thirty  costing  bands  will  be  formed  and  a  comparison  of  the 
regression  models  using  the  data  from  the  original  costing 
bands  and  data  from  the  "new"  costing  bands  will  be  made.  This 
is  important  because  it  may  show  that  the  regression  models  are 
better  able  to  predict  housing  costs  with  the  added  information 
and  this  in  turn  will  provide  better,  more  accurate  VHA  rates. 




II.  ANALYSIS  PROCEDURES 


A.  ORDINARY  LEAST  SQUARES  REGRESSION 

Most  of  the  analysis  performed  in  this  thesis  employs 
simple  linear  regression  (ordinary  least  squares)  to  test  the 
various  postulated  models. 

In ordinary least squares regression, a linear model,

$$Y_i = B_0 + B_1 X_i + e_i \tag{4}$$

is used to find the relationship between the $X_i$'s (independent
variables) and the $Y_i$'s (dependent variables). The random error
component is denoted by $e_i$; the $e_i$'s are assumed to be normally
distributed independent random variables with mean zero and
constant variance, $\sigma^2$. This relationship as described by $B_0$
and $B_1$ is used to further predict or estimate other $Y_i$'s. Since
$B_0$ and $B_1$ are fixed and unknown, $b_0$ and $b_1$ are used to denote
the estimates of their values [Ref. 2:p. 11]. With the
utilization of these estimators the least squares regression
fitted values are described by [Ref. 2:p. 11],

$$\hat{Y} = b_0 + b_1 X. \tag{5}$$

The values for $b_0$ and $b_1$ are determined by minimizing

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - B_0 - B_1 X_i)^2. \tag{6}$$

By differentiating this equation first with respect to $B_0$ and
then with respect to $B_1$, and then by setting these results
equal to zero and solving for $B_0$ and $B_1$, the values for $b_0$ and
$b_1$ are found by setting the solution for $B_0$ equal to $b_0$ and $B_1$
equal to $b_1$ [Ref. 2:p. 13]. The rationale behind this
minimization process is to ensure that the predicted ith value
is as "close" as possible (in Euclidean vertical distance) to
the actual ith value for all i. If the model (4) is correct
these estimates have minimum variance among all unbiased
estimates [Ref. 2:p. 14]. Utilizing the method above, the
value for $b_0$ [Ref. 2:p. 14] is
given by


$$b_0 = \bar{Y} - b_1 \bar{X} \tag{7}$$

and the value for $b_1$ [Ref. 2:p. 13] is given by

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}. \tag{8}$$

The sum of the residuals squared divided by the number of
observations, n, minus two is given by

$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2} \tag{9}$$

and represents the unbiased estimator of the variance about
the regression, $\sigma^2_{Y \cdot X}$ (i.e., the conditional variance of Y
given X) [Ref. 2:p. 21], if the model is correct. If a
postulated model is the true model then $\sigma^2 = \sigma^2_{Y \cdot X}$ [Ref.
2:p. 23]. Thus $s^2$ is an estimate of $\sigma^2$ if the model is
correct [Ref. 2:p. 23].
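Equations (7), (8) and (9) can be evaluated directly, as in the minimal sketch below. The function name and the five data points are hypothetical and serve only to show the arithmetic.

```python
def ols_fit(x, y):
    """Return (b0, b1, s2) for the simple linear model of equation (4)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # equation (8)
    b0 = y_bar - b1 * x_bar              # equation (7)
    # Residual sum of squares about the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                   # equation (9)
    return b0, b1, s2


# Hypothetical observations (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, s2 = ols_fit(x, y)
print(b0, b1, s2)
```

For these points the sums work out to $\sum (X_i - \bar{X})^2 = 10$ and $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 19.9$, so the slope is 1.99 and the intercept is 0.05.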


The basic assumptions of ordinary least squares regression
are:

1.  $E(e_i) = 0$, $V(e_i) = \sigma^2$.

2.  $e_i$ and $e_j$ are uncorrelated, $\text{Cov}(e_i, e_j) = 0$ for $i \neq j$.

3.  $e_i$ is a normally distributed random variable with mean
zero and variance $\sigma^2$. Thus the $e_i$'s are independent.

4.  $E(Y|X) = a + bX$, the conditional expectation of Y given
X is linear in X.

If assumptions 1 and 2 hold then ordinary least squares
provides the minimum variance linear unbiased estimates
of $B_0$ and $B_1$ [Ref. 2:p. 87]. If all of the above
assumptions hold then $b_0$ and $b_1$ are the maximum likelihood
estimates of $B_0$ and $B_1$, and $s^2$ is an unbiased estimate of $\sigma^2$
[Ref. 2:p. 88].

If the residuals are normally distributed it is then
possible to use the F and t tests to test the significance of
the regression and to test the individual null hypotheses that
$B_0$ equals 0 or that $B_1$ equals 0. If the null hypotheses are not
rejected and the values for $B_0$ and $B_1$ are not deemed different
from zero then, of course, there is no significant linear
relationship between the independent variables and the
dependent variables. The t test statistic is

$$t = \frac{(b_1 - 0)\left\{\sum_{i=1}^{n} (X_i - \bar{X})^2\right\}^{1/2}}{s} \tag{10}$$

and has a Student's t distribution with n-2 degrees of freedom
[Ref. 2:p. 26]. The F test statistic tests the overall
significance of the regression. The F test statistic is

$$F = \frac{b_1 \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{s^2} \tag{11}$$

and has 1 and n-2 degrees of freedom [Ref. 2:p. 32].

The $R^2$ value measures the "proportion of total variation
about the mean $\bar{Y}$ explained by the regression" [Ref. 2:p. 33].
$R^2$ is the sum of squares due to regression divided by the total
sum of squares, corrected for the mean $\bar{Y}$, and is denoted by

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}. \tag{12}$$

Values for $R^2$ fall between 0 and 1. The closer the value of
$R^2$ is to 1 the better the regression equation explains the
variation of the data about $\bar{Y}$.
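The t statistic, the F statistic and $R^2$ can be computed together from the same sums of squares, as in the sketch below. The function name and data are hypothetical. The sketch also illustrates that, for a regression with a single independent variable, the F statistic equals the square of the t statistic for the slope.

```python
import math


def regression_summary(x, y):
    """Return the t statistic for the slope, the F statistic and R^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ss_res / (n - 2)
    ss_reg = b1 * sxy                       # sum of squares due to regression
    t_stat = b1 * math.sqrt(sxx) / math.sqrt(s2)   # t statistic for the slope
    f_stat = ss_reg / s2                           # F statistic, 1 and n-2 df
    r2 = ss_reg / syy                              # proportion explained
    return {"t": t_stat, "F": f_stat, "R2": r2}


# Hypothetical observations (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = regression_summary(x, y)
print(summary)
```

Comparing the statistics with tabled t and F values is left out of the sketch, since it requires a distribution table or a statistics library.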

The  "residuals  contain  all  available  information  on  the  way 
in  which  the  fitted  model  fails  to  properly  explain  the 
observed  variation  in  the  dependent  variable  Y"  [Ref.  2:p.  34]. 
Thus  careful  examination  of  the  residuals  will  provide 
indications as to the adequacy of the proposed model. A
graphic examination of the residuals may provide an indication
that the model is systematically deficient. Also, utilizing a
lack of fit test may indicate that the model appears to be
inadequate.

The lack of fit test breaks the residual sum of squares
into a lack of fit component and a pure error component, which
yield the mean square due to lack of fit, $MS_L$, and the mean
square due to pure error, $s_e^2$ [Ref. 2:p. 37]. $MS_L$
estimates $\sigma^2$ if the model is correct and $\sigma^2$ plus a bias term if
the model is inadequate. The value for $s_e^2$ estimates $\sigma^2$ [Ref.
2:p. 37]. The lack of fit test compares the F ratio $MS_L/s_e^2$ with
the $100(1-\alpha)\%$ point of an F distribution with $(n_r - n_e)$ and $n_e$
degrees of freedom, where $n_r$ equals the number of degrees of
freedom associated with the residual sum of squares and $n_e$
equals the number of degrees of freedom associated with the
pure error sum of squares. If the comparison is significant
(i.e., the F ratio is greater than the tabled F value) this
then serves as an indication that the model is inadequate [Ref.
2:p. 37]. If the test is not significant (i.e., the F ratio
value is less than the tabled F value), this is an indication
that "there appears to be no reason to doubt the adequacy of
the model and both pure error and lack of fit mean squares can
be used as estimates of $\sigma^2$" [Ref. 2:p. 37].
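The lack of fit F ratio can be sketched as follows. Pure error can only be estimated when some X values have repeated observations, so the hypothetical data below contain two observations at each X. The function name is an assumption, and the comparison with a tabled F value is omitted.

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Return (F ratio, lack of fit df, pure error df) for a line fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of repeated Y values about their own mean.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    ss_pe = 0.0
    for ys in groups.values():
        mean_y = sum(ys) / len(ys)
        ss_pe += sum((yi - mean_y) ** 2 for yi in ys)

    n_r = n - 2                 # residual degrees of freedom
    n_e = n - len(groups)       # pure error degrees of freedom
    ms_l = (ss_res - ss_pe) / (n_r - n_e)   # lack of fit mean square
    s2_e = ss_pe / n_e                      # pure error mean square
    return ms_l / s2_e, n_r - n_e, n_e


# Hypothetical curved data (roughly Y = X^2) fitted with a straight line.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
y = [0.9, 1.1, 3.9, 4.1, 8.9, 9.1, 15.9, 16.1]
f_ratio, df_lof, df_pe = lack_of_fit(x, y)
print(f_ratio, df_lof, df_pe)
```

The large ratio obtained for these data reflects the curvature that a straight line cannot capture, which is the situation the test is designed to detect.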

When the residuals are examined graphically, a scatter plot of
the $e_i$'s versus the $\hat{Y}_i$'s will give an indication as to whether
or not the assumptions of normality, homoscedasticity and
linearity of ordinary least squares have been violated. If the
proposed model is correct, the resulting residuals should
indicate that these assumptions hold [Ref. 2:p. 141]. If the
model is correct a plot of the residuals versus the fitted
values should take the shape of a horizontal band as shown in
Figure 2.1 below [Ref. 2:p. 145]. If the plot of the residuals
takes the shape of a funnel as shown in Figure 2.2 below [Ref.
2:p. 146], the variance, $\sigma^2$, is not constant and is increasing
with x, which indicates the need either for weighted least
squares or a transformation on the observations before
performing a regression analysis [Ref. 2:p. 147].


Figure 2.1  Satisfactory Residual Plot
[Ref. 2:p. 145]

Figure 2.2  Unsatisfactory Funnel-Shaped Residual Plot
[Ref. 2:p. 146]

B.  INITIAL  MODELS  TESTED  USING  ORDINARY  LEAST  SQUARES 
REGRESSION 

The  first  step  in  this  analysis  was  to  test  the  model 
currently  in  use,  equation  (2),  to  see  if  it  could  be  used  to 
predict  median  rents  for  each  of  the  thirty  costing  bands. 
The  model  was  tested  under  several  different  conditions. 
First,  the  model  was  run  using  all  of  the  available  data  in 
each costing band. Next the data were divided by marital status
and within each costing band the model was tested using all of
the data for those military personnel with dependents and then
using all of the data for those military personnel without
dependents. The model was tested under another condition in
which the data were broken down further by paygrades into
enlisted (paygrades 1-9), company grade officers (paygrades
10-19), and field grade officers (paygrades 20-23).
Thus the model was tested within each costing band according
to groupings of the data consisting of enlisted personnel,
company grade officers, and field grade officers. Finally the
current model was tested within each costing band by grouping
the data by a combination of dependency status and paygrade
categories. In this case the data in each costing band were
first broken into groups by dependency status and within each
dependency group, the data were further broken into categories
of enlisted, company grade officer and field grade officer.

For each of the above mentioned conditions in which the
model was tested, the data were plotted as $1/T_{ijkl}$ versus
$1/P_{ijkl}$, the model was tested using ordinary least squares
regression procedures, the residuals were plotted versus the
fitted values of the median rents, $T_{ijkl}$, and the residuals
were tested for normality. (These results are given in the next
chapter.)

A review of the results of the regression procedures indicated
that the initial model did not seem to adequately describe the
relationship between total pay and median rental costs, nor did
it serve as an adequate predictor of fitted values for median
rental costs, since the assumptions of least squares regression
were violated. Evidence of this includes low $R^2$ values,
non-normality of the residuals, unequal variance of the data,
and an indication of significant lack of fit. This, along with
cross-validation results, is explained in detail in the
analysis portion of this thesis. Therefore a new model was
postulated. The new model was

$$T_{ijkl} = A P_{ijkl} + B + e_{ijkl} \tag{13}$$

in which all of the variables have the same meaning as in the
previous model. The only difference was that the total pay and
median rental cost vectors were not inverted. This model was
tested under all of the same conditions as the initial model.
In other words the model was first tested using all of the data.
The data were then broken into groups by dependency status and
the regression was run again. The data were next broken into
groups by paygrade and ordinary least squares regression was
used to test the model using these data. Finally the data were
broken into groups by a combination of paygrade and
dependency status and the model was again tested.

The  results  of  the  regression  analysis  testing  this  model 
again  indicated  that  a  systematic  deficiency  in  the  model 
existed;  namely  that  the  residuals  exhibited  a  tendency  towards 
nonconstant  variance  and  that  the  residuals  were  not  normally 
distributed.  The  nonconstant  variance  is  explainable  by  the 
fact  that  different  medians  from  different  population  sizes 
will  have  different  variances.  Thus  a  weighted  least  squares 
approach was attempted in conjunction with this model.




C.  WEIGHTED  LEAST  SQUARES  REGRESSION 

If a postulated model has been tested using ordinary least
squares procedures and examination of the residuals shows a
nonconstant variance, some type of transformation on Y is
necessary. This transformation will change the $e_i$'s so that
the assumptions of ordinary least squares regression will hold.
[Ref. 2:p. 147] Generally a nonconstant variance among the
residuals indicates that some of the observations are "less
reliable" than others. [Ref. 2:p. 108] In this case the $e_i$'s
are normally distributed with mean 0 and variance $v_i\sigma^2$
instead of $\sigma^2$. Thus the $e_i$'s have variance
$v_i\sigma^2$. To combat this nonconstant variance term,
$v_i\sigma^2$, the entire regression equation

$$Y_i = b_0 + b_1 X_i + e_i \qquad (14)$$

is multiplied by the weight $1/\sqrt{v_i}$. Thus the regression
equation becomes

$$\frac{Y_i}{\sqrt{v_i}} = \frac{b_0}{\sqrt{v_i}} + \frac{b_1 X_i}{\sqrt{v_i}} + \frac{e_i}{\sqrt{v_i}} \qquad (15)$$

Then $E(e_i/\sqrt{v_i}) = 0$ and $V(e_i/\sqrt{v_i}) = E(e_i^2/v_i) = v_i\sigma^2/v_i = \sigma^2$.
Thus $e_i/\sqrt{v_i} \sim N(0, \sigma^2)$. Therefore the
assumptions of ordinary least squares will now hold and
ordinary least squares procedures may now be applied to the
transformed regression equation.
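
As a minimal sketch of the transformation in equation (15), the following Python fragment divides every term of the model by the square root of $v_i$ and then applies ordinary least squares to the transformed data. The function name, the data values, and the choice $v_i = X_i^2$ are hypothetical and serve only as illustration; the analysis in this thesis was run in SAS, not with this code.

```python
import numpy as np

def wls_by_transformation(x, y, v):
    """Fit Y = b0 + b1*X by OLS after dividing every term by sqrt(v_i)."""
    x, y, v = (np.asarray(a, dtype=float) for a in (x, y, v))
    w = 1.0 / np.sqrt(v)                     # the weight 1/sqrt(v_i)
    # After the transformation the intercept column is 1/sqrt(v_i)
    # and the slope column is X_i/sqrt(v_i), as in equation (15).
    design = np.column_stack([w, w * x])
    coef, *_ = np.linalg.lstsq(design, w * y, rcond=None)
    return coef                              # array([b0, b1])

# Hypothetical data in which the error variance is assumed to grow with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.5, 11.4])
v = x ** 2                                   # assumed variance multipliers
b0, b1 = wls_by_transformation(x, y, v)
```

The transformed fit gives the same coefficients as solving the weighted normal equations with weights $1/v_i$, which is why the two descriptions of weighted least squares are interchangeable.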

Evidence  of  nonconstant  variance  was  seen  in  the  residual 
plots  after  OLS  regression  was  applied  using  the  model  (13) 
for  most  of  the  costing  bands.  This  implies,  as  stated  above, 
that some of the observations were less reliable than others.
Intuitively this makes sense in this problem since each
observation represents a median cost and not an individual
cost. Thus some observations represent the median of 20 or 30
data points while other observations represent the median of
only 5 data points. This makes the median of only five data
points "less reliable" than the median of 20 or 30 data points.

In order to transform the model into one in which the
assumptions of ordinary least squares hold, a weight
$1/\sqrt{v_i}$ must be found. In this case the necessary weight
is $1/s_i$ where

$$s_i = \frac{1.25\, R_i}{1.35\, \sqrt{n_i}}. \qquad (16)$$

This is the Gaussian-based approximation (Kendall and Stuart,
1967) of the standard deviation of the median. [Ref. 3:p. 16]
$R_i$ equals the interquartile range for the ith subset of data
and $n_i$ equals the number of data points comprising that
median. The reason for this is that if x is $N(\mu, \sigma)$
then the median is approximately

$$N\left(\mu,\ \sqrt{\frac{\pi}{2n}}\,\sigma\right).$$

From the normal table, for normal distributions,
IQR = 1.35$\sigma$; thus

$$s = \sqrt{\frac{\pi}{2}}\,\frac{\mathrm{IQR}}{\sqrt{n}\,(1.35)} = \frac{1.25}{1.35}\,\frac{\mathrm{IQR}}{\sqrt{n}}. \qquad (17)$$

Since the variance of $e_i$ is $v_i\sigma^2$ and since $s_i^2$
is an estimate of $v_i\sigma^2$, if we transform the $e_i$'s
into $e_i/s_i$ the variance of $e_i/s_i$ should approximate 1.
The variance of the transformed $e_i/s_i$ is now estimated to
be one and is thus approximately constant. Accordingly, the
predictor will have more nearly constant variance. Therefore
this assumption of ordinary least squares holds and OLS
regression procedures are more appropriately performed on the
transformed model.
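
The arithmetic behind equation (16) can be checked with a short Python sketch. It computes the estimated standard deviation of a median from an interquartile range and a group size, and it recovers the two constants from the normal distribution. The function name and the example values (an interquartile range of 100 with 5 and 25 observations) are hypothetical.

```python
import math
from statistics import NormalDist

def median_std_estimate(iqr, n):
    """Equation (16): s = 1.25 * IQR / (1.35 * sqrt(n))."""
    return 1.25 * iqr / (1.35 * math.sqrt(n))

# Where the two constants come from, for a normal distribution:
sqrt_half_pi = math.sqrt(math.pi / 2)             # rounds to 1.25
iqr_over_sigma = 2 * NormalDist().inv_cdf(0.75)   # rounds to 1.35

# Hypothetical groups with the same spread but different sizes.
s_5 = median_std_estimate(100.0, 5)     # median of 5 data points
s_25 = median_std_estimate(100.0, 25)   # median of 25 data points
weight_5, weight_25 = 1.0 / s_5, 1.0 / s_25
```

Under these assumed values the median of 25 points receives a larger weight than the median of 5 points, which matches the notion that medians of small groups are "less reliable".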

D.  ANALYSIS  OF  COVARIANCE  MODEL 

The results of using a weighted least squares approach
with the transformed model, equation (15), indicated that this
was more sensible than using ordinary least squares; however,
another approach also seemed plausible. Analysis of Covariance
(ANCOVA) was used, in which the grand mean rental cost is
adjusted within each group of paygrade, number of bedrooms and
house type by the rental cost which is determined by these
factors. Thus the ANCOVA model would become

$$Y_{ijk} = X_0 B_0 + X_{ijk} B_{jk} + e_{ijk} \qquad (18)$$

in which the $X_0 B_0$ term is the grand mean and the
$X_{ijk} B_{jk}$ term is the total pay for each group of number
of bedrooms and house type. The $Y_{ijk}$ term would represent
rental cost for the ith person, dimensioned by the jth type of
house and the kth number of bedrooms in the house. This model
differs from the previous model in that instead of using
medians of total pay within groups of paygrade, house type,
bedrooms, and dependency status to predict median rent, the
model used the total pay of each individual person in a costing
band and the deviations caused by differences in house type and
number of bedrooms to predict rent. Thus, in this case, total
pay becomes the continuous variable and house type and number
of bedrooms become the categorical terms. Paygrade and
dependency status were not used as class variables in this
model since total pay adequately reflected their values. Their
inclusion would cause collinearity to exist among the variables
and the regression estimates would then be biased.
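
One simple reading of an analysis of covariance with total pay as the continuous variable and house type and number of bedrooms as class variables can be sketched in Python. The sketch below assumes a common slope on total pay and additive 0/1 indicator columns for each non-reference level; this is a simplification chosen for illustration and is not necessarily the parameterization that SAS fitted for equation (18). All names and data values are hypothetical.

```python
import numpy as np

def ancova_design(total_pay, house_type, bedrooms):
    """Intercept, total pay, and 0/1 indicators for non-reference levels."""
    total_pay = np.asarray(total_pay, dtype=float)
    columns = [np.ones_like(total_pay), total_pay]
    for factor in (np.asarray(house_type), np.asarray(bedrooms)):
        levels = np.unique(factor)
        for level in levels[1:]:            # lowest level is the reference
            columns.append((factor == level).astype(float))
    return np.column_stack(columns)

# Hypothetical noiseless records, built from known coefficients so the
# fit can be checked: rent = 100 + 0.2*pay + 50 (house 2) + 30 (2 beds).
pay = np.array([1000., 1500., 2000., 2500., 1200., 1800., 2200., 3000.])
house = np.array([1, 1, 2, 2, 1, 2, 1, 2])
beds = np.array([1, 2, 1, 2, 2, 1, 1, 2])
rent = 100 + 0.2 * pay + 50 * (house == 2) + 30 * (beds == 2)

X = ancova_design(pay, house, beds)
coef, *_ = np.linalg.lstsq(X, rent, rcond=None)
# coef holds: intercept, slope on pay, house-type 2 shift, 2-bedroom shift
```

Because paygrade and dependency status largely determine total pay, adding indicator columns for them to this design would make its columns nearly dependent, which is the collinearity concern raised above.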

E.  CROSS  VALIDATION  TECHNIQUES 

Since the weighted least squares approach with the model
(15) and the ANCOVA approach (18) using all the data, not the
median data, were thought to be the most sensible, a cross
validation technique was used in each case to test the
parameter estimates and the models. For the weighted least
squares model half of the data was used to determine regression
coefficients and these coefficients were then used with the
other half of the data to calculate new fitted values. These
values were then compared to the actual observed values to find
estimates of slope and intercept. The expression

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (19)$$

is the residual sum of squares. These values for the sum of the
squares of the residuals were compared for each half of the
data within each of the thirty costing bands for the weighted
least squares model. For the ANCOVA model, no provision in SAS
was available for the above described cross validation, so the
data for each costing band was randomly divided in half and the
parameter estimates of the coefficients and their standard
errors for each half of the data were compared (see results in
Analysis chapter).

III. ANALYSIS


A.  ANALYSIS  OF  CURRENT  MODEL 

The  current  model,  equation  (2),  was  run  using  OLS 
regression  procedures  with  the  data  from  the  thirty  costing 
bands,  fifteen  of  which  contained  data  as  specified  by  the  Per 
Diem  Committee  and  fifteen  which  contained  the  additional  data 
obtained  from  those  military  members  who  are  in  paygrades  E5 
and  above  and  who  share  their  residences.  The  results  of  the 
regression  analysis  indicated  that  this  model  was  suspicious 
in  that  it  did  not  adequately  fit  the  data,  and  would  therefore 
perhaps  not  produce  an  adequate  prediction  of  median  rent  based 
on  total  pay. 

Initially the current model, equation (2), was run using
all of the available data within each costing band. The data
was plotted, median rent versus total pay, for each costing
band. A spread in the variance of the data was seen and in
some instances a curve was present, indicating a nonlinear
rather than a linear relationship (see Appendix A). The
regression analysis results, as seen in Table 1 (see Appendix
C), showed that in twenty-three out of twenty-eight cases the
model had a significant lack of fit. (The data from the other
two costing bands contain only two data points and regression
analysis is not valid in these two cases.) The residual plots
from each of these regressions also exhibited evidence of
nonconstant variance, which was a further indication that the
model was inadequate. (These residual plots can be seen in
Appendix A.) The regression results from the costing bands
which did not exhibit a significant lack of fit did, however,
have residuals which had a nonconstant variance and were not
normally distributed. Also the R² values in each of these
cases were extremely low (less than .32), which again served as
an indication that the model explained at most a third of
the variance.
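
For readers unfamiliar with the lack of fit statistic referred to above, the following Python sketch shows the classical pure-error version of the test for a straight line. It assumes that some values of the predictor are repeated, so that the scatter of the replicates around their own group mean can be separated from the departure of the group means from the fitted line. The function name and the data are hypothetical, and this is not a reproduction of the SAS output summarized in Table 1.

```python
import numpy as np

def lack_of_fit_f(x, y):
    """F statistic for lack of fit of a straight line, using replicated x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    sse = np.sum((y - (intercept + slope * x)) ** 2)   # residual SS
    levels = np.unique(x)
    # Pure error: scatter of replicates around their own group mean.
    sspe = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
    sslf = sse - sspe                                  # lack-of-fit SS
    df_lf = len(levels) - 2
    df_pe = len(x) - len(levels)
    return (sslf / df_lf) / (sspe / df_pe)

# Hypothetical data: two replicates at each of four x values.
x_rep = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
noise = np.array([-0.1, 0.1] * 4)
f_linear = lack_of_fit_f(x_rep, 2 * x_rep + noise)     # truly linear
f_curved = lack_of_fit_f(x_rep, x_rep ** 2 + noise)    # truly curved
```

A large F value relative to the F distribution with (df_lf, df_pe) degrees of freedom signals significant lack of fit; the comparison against a critical value is left out of this sketch.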

The data within each of the thirty costing bands was then
broken into two groups according to dependency status. The
"zero" group within each costing band contained the data from
those military members who had dependents, and the "one" group
contained the data from those military members who claimed no
dependents. The regression model, equation (2), was run again
using these new groupings of the data. The results of the
regression analysis again indicated that this model was
entirely inappropriate. Although there was not one case of
significant lack of fit, the residual analysis of the data, as
seen in Table 2 (Appendix C), from twenty-six out of
twenty-eight of the costing bands illustrated that the
residuals were not normally distributed. The residual plots
(Appendix A) again show nonconstant variance. Two data sets,
the "zero" labeled data from costing bands 510 and 512, had
residuals that were normally distributed with constant
variance, showed no significant lack of fit, and exhibited a
significant regression according to the F test for significance
of the regression; nevertheless, they had low R² values of less
than .500, which indicates a lot of unexplained variance. In
this instance, with the data broken into groups by dependency
status, the model again was inadequate.

Next the data within each of the thirty costing bands was
broken into groups according to paygrade. Paygrade 1 consisted
of the data from military members who are in paygrades E1 to
E9. Paygrade 2 consisted of the data from military members who
are in paygrades W1-W4, O1E-O3E, and O1-O3. Paygrade 3
consisted of the data from military members in paygrades
O4-O7. Data from paygrades O8 and above are included in the
data for paygrade O7. The model, equation (2), was again tested
using this data. With the data from the costing bands broken
into groups in this manner there were 84 different cases in
which the model was tested. In fifty out of eighty-four cases,
as can be seen in Table 3 (Appendix C), a significant lack of
fit was found. Of those thirty-four cases where there was not
a significant lack of fit, twenty-eight had residuals which
were not normally distributed and had residual plots which
showed evidence of nonconstant variance. The six cases which
showed no evidence of lack of fit, and which had residuals
which were normally distributed, namely costing band 632
paygrade 3, costing band 530 paygrade 2, costing band 590
paygrade 2, costing band 570 paygrade 3, costing band 650
paygrade 3, and costing band 510 paygrade 2, all had R² values
less than .330. Thus once again there was strong evidence that
even in this case where the data was broken into groupings
according to paygrade the model was inadequate.

To further ensure that the model was tested under all
appropriate conditions, the data was broken into groups first
by dependency status and then further broken into groups by
paygrade. Thus the data from each costing band was broken into
"zero" or "one" groups as defined previously. The "zero" or
"one" groups were then broken into further groupings according
to paygrade. Thus the "zero" group, for example, was broken
into three further groups, paygrade 1, paygrade 2, and paygrade
3, also as previously defined. Therefore each of the
twenty-eight costing bands now has two dependency statuses and,
within each dependency status, three paygrades associated with
it. Thus the model was tested using 168 different sets of data.
The results of the regression analysis, using each of these
different data sets, can be seen in Table 4 (Appendix C). At
an alpha level of .05 three out of the 168 data sets showed
significant lack of fit. Of those data sets which did not show
a significant lack of fit, 105 had residuals which were not
normally distributed and which had residual plots which
exhibited nonconstant variance. Of the remaining sixty sets
of data, which show no significant lack of fit and normally
distributed residuals, nineteen did not have significant
overall regressions according to the F test at an alpha level
of .05. The remaining forty-one data sets, which did not show
significant lack of fit, which had normally distributed
residuals and residual plots showing constant variance
(Appendix A), and which had regressions which were significant
according to the F test, all had R² values which were less
than .440. In fact all but four of these remaining data sets
had R² values which were less than .220. Thus this analysis
indicates once again that the original model was woefully
inadequate and that in none of the cases where the data was
broken into groups according to dependency status, or by
paygrade, or by a combination of both would this model
adequately predict median rent based on total pay. An adequate
model would be one in which there was no lack of fit, the
assumptions of Least Squares Regression would hold, and the R²
values would be high, indicating that the model explains the
variance of the data.
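
Two of the criteria named above, the R² value and the F test for significance of the regression, can be computed for a simple linear regression as in the following Python sketch. The function name and the five data points are hypothetical and are used only so the arithmetic can be followed by hand.

```python
import numpy as np

def r_squared_and_f(x, y):
    """R-squared and overall F statistic for a straight-line fit."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    sse = np.sum((y - (intercept + slope * x)) ** 2)   # unexplained SS
    sst = np.sum((y - y.mean()) ** 2)                  # total SS
    r2 = 1.0 - sse / sst
    # One regression degree of freedom, n - 2 residual degrees of freedom.
    f_stat = (sst - sse) / (sse / (len(x) - 2))
    return r2, f_stat

# Hypothetical data: total SS = 6, explained SS = 3.6, residual SS = 2.4.
r2_demo, f_demo = r_squared_and_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

An R² value of .22, for instance, means that the fitted line accounts for only 22 percent of the total sum of squares, even when the F statistic is large enough to call the regression significant.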

B.  ANALYSIS  OF  PROPOSED  MODEL 

The  proposed  model,  equation  (13),  was  tested  using  the 
same  data  from  the  thirty  costing  bands  as  was  used  to  test  the 
current  model,  equation  (2).  The  results  of  the  regression 
analysis  indicated  that  in  certain  cases  the  use  of  this  model 
may  be  more  adequate  in  predicting  median  rent  from  total  pay; 
however  it  must  be  used  with  caution. 

This model, equation (13), was also tested using the same
groupings of data as used in testing the current model,
equation (2). Initially, the model was tested using all of the
data within each costing band. As in the previous model, median
rent versus total pay was plotted. The plots indicated an
increase in variance but indicated a strong linear
relationship. The results of the regression analysis showed
that in all twenty-eight instances, see Table 5, a significant
lack of fit was evidenced.

Next the data within each costing band was broken into groups
by dependency status. The data was labeled with a zero if the
military member had dependents and with a one if the military
member had no dependents or had no dependents and was sharing
his or her residence. The plots of median rent versus total pay
for each costing band indicated an even stronger linear
relationship than in the original plots but they still
exhibited evidence of unequal variance. The results of the
regression analysis, see Table 6, showed that in eight out of
fifty-six cases a significant lack of fit was evidenced. Of the
remaining forty-eight cases twelve had residuals which were not
normally distributed. The residual plots of these data sets
showed that nonconstant variance was present. The residual
plots of the thirty-six cases which did not have significant
lack of fit, which had residuals which were normally
distributed, and which were significant regressions at the
alpha level .05, also showed some evidence of nonconstant
variance. Also, the R² values were in the .4 to .5 range, with
the highest value being .55. These R² values are lower than the
ones obtained with the use of the Weighted Least Squares model,
seen in the next section, whose purpose is to reduce or
eliminate the nonconstant variance of the residuals. Thus
prediction was worse for residuals with more variance (see
Appendix A).

The data within each costing band was next broken into groups
by paygrade. This procedure was the same as the one used in
testing the current model: paygrade 1 reflected paygrades
E1-E9, paygrade 2 reflected paygrades W1-W4, O1E-O3E, and
O1-O3, and paygrade 3 reflected paygrades O4-O7 with paygrades
O8-O10 included in paygrade O7. When the data was broken into
these groups there were many more cases of significant lack of
fit, fifty-six out of eighty-four (see Table 7, Appendix C).
Also because of few data points within each group, the overall
regressions in many instances were not significant.

Finally the data was broken into groups first by dependency
status and then by paygrade. The results of the regression
analysis indicated that while there were only eight cases of
significant lack of fit out of one hundred and sixty-eight (see
Table 8, Appendix C), thirty had residuals which were not
normally distributed and, because of few data points within
each group, some of the data sets did not have significant
regressions at the .05 alpha level. Of the regressions on the
data sets which did fulfill all of the criteria, the R² values
were low. Thus the model best predicted median rents from total
pay when the data was divided by dependency status; however,
this model must be viewed as possibly inaccurate since the
residual plots indicated evidence of nonconstant variance, and
a better model would predict points in an unbiased fashion.

C. ANALYSIS OF WEIGHTED LEAST SQUARES MODEL

Analysis of the Weighted Least Squares Model, equation
(15), with $Y_i$ = median rent and $X_i$ = total pay for the ith
group, was conducted in the same manner as that of the current
model, equation (2), and that of the proposed model, equation
(13). The only difference here was that initially the data
were randomly divided into two sections in order to use cross
validation procedures. The sum of the squares of the residuals
of the first division of data was compared to the sum of the
squares of the errors of the second division of data, for which
the parameter estimates from the first set of data were used
to compute the predicted values. Thus the Weighted Least
Squares model was first tested using one half of all of the
data available within each costing band; next the model was
tested using the half of the data which had been divided into
groups by dependency status; then the model was tested using
the half of the data which had been broken into groups by
paygrade within each costing band; and finally the model was
tested with half of the data which had been broken first into
groups according to dependency status and then by paygrade.

The results of the regression analysis using half of all
of the data within each costing band showed (see Table 9,
Appendix C) that a significant lack of fit existed for each
costing band. When the data was broken into divisions by
dependency status the regression analysis results, see Table
10 (Appendix C), showed that seventeen out of fifty-six cases
exhibited significant lack of fit and that nine out of the
thirty-nine remaining cases did not have normally distributed
residuals. Three out of the remaining thirty cases did not
have regressions which were significant overall, and in the
remaining twenty-seven cases, in which all statistical criteria
were met, the R² values were typically between .44 and .75.
There was no evidence of nonconstant variance in the residual
plots and the residuals appeared to be normally distributed in
most cases.

When the data was broken into groups by paygrade, only
twenty-five out of a possible eighty-four cases, see Table 11
(Appendix C), met all of the criteria of successful regression
in that they did not have significant lack of fit, their
residuals were normally distributed, and their regressions were
significant at the .05 alpha level. The R² values, however,
ranged from very low to a high of .73. Again the residual
plots appeared to indicate a fairly normal distribution with
little evidence of nonconstant variance.

The results of the regression analysis, when the data was
broken into groups both according to dependency status and
paygrade, see Table 12, showed that better than half, 93 out
of 168, met the criteria for a successful regression and had
R² values ranging mostly between .4 and .65. There were,
however, very few data points in some categories; thus these
results must be viewed with suspicion. The statistics for lack
of fit, normality of the residuals, and overall significance
of the regression all might have been affected by this small
number of data points. Therefore this model using a weighted
least squares approach, equation (15), performed best when the
data within each costing band was divided according to
dependency status.

The cross validation technique used here proved to be
unsuccessful since only the sum of squares of the residuals
(SSR) terms were compared, see Table 13 (Appendix C), in the
case where all of the data was used within each costing band.
For each costing band, the differences between the SSR for the
first group of data and the SSR for the second group, whose
predicted values were found by employing the parameter
estimates from the first set of data, were quite large. This
could be due to the lack of fit which was found or due to the
fact that the second group generally had several more data
points than the first group. Either of these two factors or a
combination of both might have accounted for these tremendous
differences.
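
The split-half comparison described above can be sketched in Python as follows. The sketch fits an unweighted straight line on a random half of the data, applies those coefficients to the other half, and evaluates the sum of squared residuals of equation (19) on each half. It also reports each sum divided by its group size, an addition not made in the thesis, to show how unequal group sizes alone can separate the raw sums. All names and data are hypothetical, and the weights of equation (15) are omitted for brevity.

```python
import numpy as np

def split_half_rss(x, y, seed=0):
    """Fit a line on a random half, then score both halves."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(x))
    first, second = order[: len(x) // 2], order[len(x) // 2:]
    slope, intercept = np.polyfit(x[first], y[first], deg=1)

    def rss(idx):
        # Equation (19) evaluated with the first-half coefficients.
        return float(np.sum((y[idx] - (intercept + slope * x[idx])) ** 2))

    return {
        "rss_fit": rss(first),
        "rss_holdout": rss(second),
        # Dividing by group size makes unequal halves comparable.
        "mean_sq_fit": rss(first) / len(first),
        "mean_sq_holdout": rss(second) / len(second),
    }

# Hypothetical data lying exactly on a line; 11 points give halves of 5 and 6.
x_demo = np.arange(1.0, 12.0)
y_demo = 3.0 + 2.0 * x_demo
result = split_half_rss(x_demo, y_demo)
```

With real data the holdout sum is expected to exceed the fitting sum somewhat even for a well-specified model, so a per-observation comparison may be the more informative of the two.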

D.  ANALYSIS  OF  THE  ANALYSIS  OF  COVARIANCE  MODEL 

The results of the regression analysis on the ANCOVA model
indicated that this model may be the best model discussed thus
far for use in predicting rent based on total pay (see Table
14, Appendix C). All of the regressions were significant and
had R² values ranging from .42 to .58 with few values above or
below these numbers. The residual plots, normal plots, and
stem and leaf diagrams indicated that the residuals were
normally distributed (see Appendix C). The significance levels
of the normal statistic used to test the normality of the
residuals, however, did not, in most cases, indicate that the
residuals were normally distributed. However, the residuals
were fairly symmetric and the sample size was quite large;
therefore the model should be fairly robust to the lack of
normal fit. The residual plots showed the fairly typical
box-like pattern of randomly distributed data. The stem and
leaf and normal plots supported a fairly good defense for the
normality of the residuals.
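
The statement that the residuals were "fairly symmetric" can be quantified with the sample skewness coefficient, which is zero for a perfectly symmetric sample. The following Python sketch is one hypothetical way to compute it; the thesis judged symmetry from plots, not from this statistic, and the function name and example values are assumptions.

```python
def sample_skewness(residuals):
    """Third central moment divided by the 1.5 power of the second."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    return m3 / m2 ** 1.5

# Hypothetical residuals: one symmetric set and one with a long right tail.
skew_symmetric = sample_skewness([-2.0, -1.0, 0.0, 1.0, 2.0])
skew_right_tail = sample_skewness([0.0, 0.0, 0.0, 0.0, 10.0])
```

Values near zero are consistent with symmetry, while clearly positive or negative values point to a long tail on one side.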

In the case of several of the costing bands there did not
appear to be a significant difference in the least squares
means of the rent pertaining to different house types and
different numbers of bedrooms. This was particularly true
between house types 1 and 2 (single family home and townhouse)
and also between house types 3 and 4 (apartment and mobile
home). In some costing bands there also appeared to be no
significant difference between the least squares means of rent,
predominantly between 3 and 4 bedrooms and less predominantly
between 1 and 2 bedrooms. This indicates that, when there is
not a significant difference between the least squares means
for two different types of housing or two residences with
different numbers of bedrooms, either of the parameter
estimates of the two types of housing or numbers of bedrooms
may be used to predict rent. Thus the ANCOVA model which
predicted rent based on the total pay associated with number of
bedrooms and house type may not have been completely correct in
these cases since the mean amount of rent associated with each
type of house or number of bedrooms may not have been
different.

The cross validation technique used here, since GLM does
not provide a vehicle to compute the Sum of Squares of the
Residuals from previously calculated parameter estimates, was
one in which the data was randomly divided into two sections
and, after the ANCOVA model was run on both sets of data, the
slope parameter estimate and its standard error were compared.
A comparison of the slope parameter and its standard error
between the two sections of data from each costing band
revealed that the model was not at serious fault, since in both
of the sections of the data the slope parameter estimates were
very close and the standard errors were small and similar (see
Table 14).

IV. CONCLUSIONS AND RECOMMENDATIONS


The purpose of this thesis was to test and validate the
current model, equation (2), to see if it could effectively be
used to predict rent based on total pay from the survey data
which had been arranged in a newly devised, simplified format.
If the current model was deemed invalid or suspect, then the
second purpose of this thesis was to propose a better, more
sensible model which would adequately predict rent based on
total pay.

There are two major conclusions from the analysis contained
in this thesis. The first conclusion is that the current
model, equation (2), should not be used to predict median rents
in each paygrade and dependency status when the data is divided
into costing bands in the manner previously described. This
conclusion is justified by the results of the regression
analysis which show that this model is inadequate and may not
accurately predict median rent. The second conclusion is that
both the weighted least squares model and the ANCOVA model are
possible alternative models for use in predicting rent based
on total pay. They are shown to be at least as reasonable as
the current model, if not better. The ANCOVA model may be
preferable for predicting a mean rather than a median rent.
Also the ANCOVA model may be preferable if the model is used to
determine owner equivalency rents. If a median rent figure
must be used in the congressionally mandated formula for the
computation of VHA, the weighted least squares model is
preferable.

The secondary purpose of this thesis was to determine if
the data from military personnel in paygrades E5 and above who
share housing should be used or discarded, since these data had
been previously discarded on the basis of a policy decision
without any statistical backing. Curiously enough, there seems
to be no systematic difference across all of the models
investigated in relation to the addition of this data. In some
instances when regression analysis results from the same two
costing bands, one which contained the additional data and one
which did not contain the additional data, were compared, lack
of fit was affected. Also in some cases the significance of
the regression would be affected, or in some cases the R²
values would go up or down. Thus there was no instance in
which, for example, all of the R² values would go up or all of
the significance of regression statistics would suddenly
increase or decrease for a certain model. The important
consideration here was that the additional data did affect R²
values; it did affect the lack of fit, significance value
statistics, and the normality of residuals. Thus while the
additional data did not have a systematic effect, it did have
an effect and this aspect should not be overlooked when a
decision is made whether or not to include these data when VHA
rates are actually calculated.

There are several recommendations for further analysis.
First,  the  way  in  which  the  data  is  broken  into  costing  bands 
must  be  investigated.  Perhaps  a  better  method  or  a  different 
dollar  figure  could  be  used  to  divide  the  data  into  costing 
bands.  If  a  different  method  is  used  and  the  data  contained 
in  each  costing  band  is  different,  analysis  of  each  of  the 
regression  models  discussed  in  this  paper  must  be  redone.  If 
the  data  is  put  into  different  costing  bands  other  than  the 
ones  used  in  this  thesis,  the  models  discussed  may  be  more  or 
less  accurate  predictors  of  median  rent.  In  either  case  the 
original  data  must  be  investigated  and  natural  breaks  in  the 
data  must  be  discovered  in  order  to  achieve  the  best  placement 
of  data  into  costing  bands.  A  second  area  which  requires 
further  analysis  concerns  the  ANCOVA  model.  The  data,  before 
testing  the  ANCOVA  model,  should  be  divided  into  groups  either 
by  dependency  status  or  by  paygrade.  A  better  fit  of  the 
regression  model  may  be  accomplished  in  either  case.  Other 
models  should  also  be  investigated  as  possible  solutions  to  the 
problem.  Perhaps  instead  of  the  weighted  least  squares, 
another  transformation  on  the  data  could  be  devised  which  may 
provide  a  better  model .  Since  there  is  an  indication  of  non¬ 
normal  errors,  perhaps  GLIM  (Generalized  Linear  Models)  could 
be  used  for  more  accurate  prediction  [Ref.  4].  Further 
Analysis  and  other  models  should  still  be  investigated  as 
possible  predictors  of  median  rents  for  the  VHA. 
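As an illustration of the first recommendation, one simple way to look for natural breaks is to sort the area median rents and cut at the widest gaps between neighbouring values. The Python sketch below does this under stated assumptions: the function name split_at_largest_gaps, the choice of three bands, and the rent figures are all hypothetical and are not taken from the VHA survey or from the method used in this thesis. It is a starting heuristic, not a validated banding rule.

```python
def split_at_largest_gaps(values, n_bands):
    """Split values into n_bands groups by cutting at the widest gaps.

    Hypothetical helper for illustration only; not the banding method
    used to build the costing bands analysed in the thesis.
    """
    ordered = sorted(values)
    if not 1 <= n_bands <= len(ordered):
        raise ValueError("n_bands must be between 1 and len(values)")
    # Gap i is the jump from ordered[i] to ordered[i + 1].
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    # Keep the n_bands - 1 widest gaps, then order the cuts left to right.
    widest = sorted(gaps, reverse=True)[: n_bands - 1]
    cut_positions = sorted(i for _, i in widest)
    bands, start = [], 0
    for pos in cut_positions:
        bands.append(ordered[start : pos + 1])
        start = pos + 1
    bands.append(ordered[start:])
    return bands


# Made-up area median rents in dollars (assumed values, not survey data).
area_medians = [310, 325, 330, 342, 410, 418, 425, 540, 552]
bands = split_at_largest_gaps(area_medians, 3)
for band in bands:
    print(band[0], "-", band[-1], ":", len(band), "areas")
```

With the made-up figures above, the cuts fall between 342 and 410 and between 425 and 540, giving bands of four, three, and two areas. Any banding chosen this way would still require the regression analyses to be rerun on the regrouped data.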
APPENDIX A. SCATTER AND RESIDUAL PLOTS

A. USING DATA SET 540 AS AN EXAMPLE, SCATTER AND RESIDUAL PLOTS FOR THE CURRENT MODEL.
Figure 1. Data Set 540. 1/Median Rent vs. 1/Total Pay.
Figure 2. Data Set 540. Residuals vs. Predicted Values.
Figure 3. Data Set 540. Dependency Status '0'. 1/Median Rent vs. 1/Total Pay.
Figure 4. Data Set 540. Dependency Status '1'. 1/Median Rent vs. 1/Total Pay.
Figure 5. Data Set 540. Dependency Status '0'. Residuals vs. Predicted Values.
Figure 6. Data Set 540. Dependency Status '1'. Residuals vs. Predicted Values.
Figure 7. Data Set 540. Paygrade '1'. 1/Median Rent vs. 1/Total Pay.
Figure 8. Data Set 540. Paygrade '2'. 1/Median Rent vs. 1/Total Pay.
Figure 9. Data Set 540. Paygrade '3'. 1/Median Rent vs. 1/Total Pay.
Figure 10. Data Set 540. Paygrade '1'. Residuals vs. Predicted Values.
Figure 12. Data Set 540. Paygrade '3'. Residuals vs. Predicted Values.
Figure 13. Data Set 540. Dependency Status '0' and Paygrade '1'. 1/Median Rent vs. 1/Total Pay.
Figure 14. Data Set 540. Dependency Status '0' and Paygrade '2'. 1/Median Rent vs. 1/Total Pay.
Figure 15. Data Set 540. Dependency Status '0' and Paygrade '3'. 1/Median Rent vs. 1/Total Pay.
Figure 16. Data Set 540. Dependency Status '0' and Paygrade '1'. Residuals vs. Predicted Values.
Figure 17. Data Set 540. Dependency Status '0' and Paygrade '2'. Residuals vs. Predicted Values.
Figure 18. Data Set 540. Dependency Status '0' and Paygrade '3'. Residuals vs. Predicted Values.
Figure 19. Data Set 540. Dependency Status '1' and Paygrade '1'. 1/Median Rent vs. 1/Total Pay.
Figure 20. Data Set 540. Dependency Status '1' and Paygrade '2'. 1/Median Rent vs. 1/Total Pay.
Figure 21. Data Set 540. Dependency Status '1' and Paygrade '3'. 1/Median Rent vs. 1/Total Pay.
Figure 22. Data Set 540. Dependency Status '1' and Paygrade '1'. Residuals vs. Predicted Values.
Figure 23. Data Set 540. Dependency Status '1' and Paygrade '2'. Residuals vs. Predicted Values.
Figure 24. Data Set 540. Dependency Status '1' and Paygrade '3'. Residuals vs. Predicted Values.
B. USING DATA SET 540 AS AN EXAMPLE, SCATTER PLOTS AND RESIDUAL PLOTS FOR THE PROPOSED MODEL.

Figure 25. Data Set 540. Median Rent vs. Total Pay.
Figure 26. Data Set 540. Residuals vs. Predicted Values.
Figure 27. Data Set 540. Dependency Status '0'. Median Rent vs. Total Pay.
Figure 28. Data Set 540. Dependency Status '1'. Median Rent vs. Total Pay.
Figure 29. Data Set 540. Dependency Status '0'. Residuals vs. Predicted Values.
Figure 30. Data Set 540. Dependency Status '1'. Residuals vs. Predicted Values.
Figure 31. Data Set 540. Paygrade '1'. Median Rent vs. Total Pay.
Figure 32. Data Set 540. Paygrade '2'. Median Rent vs. Total Pay.
Figure 33. Data Set 540. Paygrade '3'. Median Rent vs. Total Pay.
Figure 34. Data Set 540. Paygrade '1'. Residuals vs. Predicted Values.
Figure 35. Data Set 540. Paygrade '2'. Residuals vs. Predicted Values.
Figure 36. Data Set 540. Paygrade '3'. Residuals vs. Predicted Values.
Figure 37. Data Set 540. Dependency Status '0' and Paygrade '1'. Median Rent vs. Total Pay.
Figure 38. Data Set 540. Dependency Status '0' and Paygrade '2'. Median Rent vs. Total Pay.
Figure 39. Data Set 540. Dependency Status '0' and Paygrade '3'. Median Rent vs. Total Pay.
Figure 40. Data Set 540. Dependency Status '0' and Paygrade '1'. Residuals vs. Predicted Values.
Figure 41. Data Set 540. Dependency Status '0' and Paygrade '2'. Residuals vs. Predicted Values.
Figure 42. Data Set 540. Dependency Status '0' and Paygrade '3'. Residuals vs. Predicted Values.
Figure 43. Data Set 540. Dependency Status '1' and Paygrade '1'. Median Rent vs. Total Pay.


[Plot of MCOST*TOTP: median rent (MCOST) on the vertical axis against total pay (TOTP) on the horizontal axis.]

Figure 44. Data Set 540. Dependency Status '1' and Paygrade '2'. Median Rent vs. Total Pay.
Figure 45. Data Set 540. Dependency Status '1' and Paygrade '3'. Median Rent vs. Total Pay.
Figure 46. Data Set 540. Dependency Status '1' and Paygrade '1'. Residuals vs. Predicted Values.
Figure 47. Data Set 540. Dependency Status '1' and Paygrade '2'. Residuals vs. Predicted Values.
Figure 48. Data Set 540. Dependency Status '1' and Paygrade '3'. Residuals vs. Predicted Values.
C. USING DATA SET 540 AS AN EXAMPLE, SCATTER PLOTS AND RESIDUAL PLOTS FOR THE WEIGHTED LEAST SQUARES MODEL.

Figure 49. Data Set 540. Residuals vs. Predicted Values.
Figure 50. Data Set 540. Dependency Status '0'. Median Rent vs. Total Pay.
Figure 51. Data Set 540. Dependency Status '1'. Median Rent vs. Total Pay.
Figure 52. Data Set 540. Dependency Status '0'. Residuals vs. Predicted Values.
Figure 53. Data Set 540. Dependency Status '1'. Residuals vs. Predicted Values.
Figure 54. Data Set 540. Paygrade '1'. Median Rent vs. Total Pay.
Figure 55. Data Set 540. Paygrade '2'. Median Rent vs. Total Pay.


[Plot of RESID*MCSTHT: residuals (RESID) on the vertical axis against predicted value (MCSTHT) on the horizontal axis.]

Figure 57. Data Set 540. Paygrade '1'. Residuals vs. Predicted Values.
Figure 58. Data Set 540. Paygrade '2'. Residuals vs. Predicted Values.
Figure 59. Data Set 540. Paygrade '3'. Residuals vs. Predicted Values.
Figure 60. Data Set 540. Dependency Status '0' and Paygrade '1'. Median Rent vs. Total Pay.
Figure 61. Data Set 540. Dependency Status '0' and Paygrade '2'. Median Rent vs. Total Pay.
Figure 62. Data Set 540. Dependency Status '0' and Paygrade '3'. Median Rent vs. Total Pay.
Figure 63. Data Set 540. Dependency Status '0' and Paygrade '1'. Residuals vs. Predicted Values.
Figure 64. Data Set 540. Dependency Status '0' and Paygrade '2'. Residuals vs. Predicted Values.
Figure 65. Data Set 540. Dependency Status '0' and Paygrade '3'. Residuals vs. Predicted Values.
Figure 68. Data Set 540. Dependency Status '1' and Paygrade '3'. Median Rent vs. Total Pay.
Figure 69. Data Set 540. Dependency Status '1' and Paygrade '1'. Residuals vs. Predicted Values.
Figure 70. Data Set 540. Dependency Status '1' and Paygrade '2'. Residuals vs. Predicted Values.
Figure 71. Data Set 540. Dependency Status '1' and Paygrade '3'. Residuals vs. Predicted Values.
D. USING DATA SET 540 AS AN EXAMPLE, STEM AND LEAF, NORMAL PLOTS, AND RESIDUAL PLOTS FOR THE ANCOVA MODEL.

Figure 72. Data Set 540. Residuals vs. Predicted Values.
Figure 73. Data Set 540. Stem and Leaf and Normal Plots.
APPENDIX B. SAS PROGRAM EXAMPLE

//EXT4 JOB (1668,9999),'WILLIAMS',CLASS=G
//*MAIN SYSTEM=sf2,LINES=(99),CARDS=(500)
// EXEC SAS
//WORK DD SPACE=(CYL,(20,2))
//DATAIN DD DISP=SHR,DSN=Mi,'4W.DPDVHA.EDITSR.CCG45.M540
//DATAOUT DD DISP=(OLD,KEEP),DSN=MSS.S1668.EXT
//SYSIN DD *

DATA DATA540;
    INFILE DATAIN;
    INPUT PG 18-19 NSHR 20-21 HT 22-23 BR 24-25 RO 26-27 COST 30-33
          E1 34 E2 35;

    BW1=269;
    BW2=269;
    BW3=282;
    BW4=304;
    BW5=349;
    BW6=388;
    BW7=420;
    BW8=452;
    BW9=491;
    BW10=373;
    BW11=431;
    BW12=469;
    BW13=511;
    BW14=428;
    BW15=463;
    BW16=513;
    BW17=365;
    BW18=408;
    BW19=478;
    BW20=578;
    BW21=655;
    BW22=680;
    BW23=755;

    BWO1=150;
    BWO2=169;
    BWO3=208;
    BWO4=212;
    BWO5=244;
    BWO6=264;
    BWO7=292;
    BWO8=342;
    BWO9=372;
    BWO10=283;
    BWO11=338;
    BWO12=381;
    BWO13=453;
    BWO14=318;
    BWO15=370;
    BWO16=434;
    BWO17=269;
    BWO18=319;
    BWO19=402;
    BWO20=502;
    BWO21=542;
    BWO22=562;
    BWO23=613;

    TP1=1054;
    TP2=1178;
    TP3=1238;
    TP4=1396;
    TP5=1631;
    TP6=1914;
    TP7=2238;
    TP8=2590;
    TP9=3072;
    TP10=2009;
    TP11=2412;
    TP12=2811;
    TP13=3321;
    TP14=2281;
    TP15=2755;   /* last digit printed as S in the source listing; read as 5 */
    TP16=33^3;   /* third digit illegible in the source listing */
    TP17=1815;
    TP18=2394;
    TP19=2966;
    TP20=3628;
    TP21=4321;
    TP22=5179;
    TP23=6517;

    IF E1 EQ 2 OR E2 EQ 2 THEN DELETE;
    IF E1 EQ 7 OR E2 EQ 7 THEN DELETE;
    IF E1 GE 8 OR E2 GE 8 THEN DELETE;
    IF NSHR GT 2 THEN DELETE;
    IF NSHR EQ 2 AND PG GT 4 THEN DELETE;
    IF RO EQ 2 THEN DELETE;
    IF COST LT 1 THEN COST = 1;
    ICOST = 1/COST;

DATA DATA540;
    SET DATA540;
    ARRAY BW(23) BW1-BW23;
    ARRAY BWO(23) BWO1-BWO23;
    ARRAY TP(23) TP1-TP23;
    DO I = 1 TO 23;
        IF PG EQ I AND NSHR EQ 0 THEN DO;
            BAQ = BW(I);
            PAY = TP(I);
            TTP = TP(I) - BAQ;
            /* total pay = pay less BAQ, plus the BAQ for this status */
            TOTP = TTP + BAQ;
            ITOTP = 1/TOTP;
        END;
        ELSE
        IF PG EQ I AND NSHR NE 0 THEN DO;
            BAQ = BWO(I);
            PAY = TP(I);
            TTP = PAY - BW(I);
            TOTP = BAQ + TTP;
            ITOTP = 1/TOTP;
        END;
    END;

DATA DATA540;
    SET DATA540;
PROC SORT DATA=DATA540;
    BY PG NSHR HT BR COST ICOST ITOTP TOTP;
DATA DATAOUT.DATA540;
    SET DATA540;
    KEEP PG NSHR HT BR COST ICOST ITOTP TOTP;
PROC UNIVARIATE DATA=DATA540 NOPRINT;
    VAR COST ICOST;
    BY PG NSHR HT BR ITOTP TOTP;
    OUTPUT OUT=DATA541
        MEDIAN=MCOST
        MEDIAN=IMCOST
        N=NUMB;
DATA DATAOUT.DATA541;
    SET DATA541;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC PLOT DATA=DATA541;
    PLOT MCOST*TOTP;
    PLOT IMCOST*ITOTP;
PROC UNIVARIATE DATA=DATA541 PLOT NORMAL;
    VAR MCOST;
PROC UNIVARIATE DATA=DATA541 PLOT NORMAL;
    VAR IMCOST;
PROC REG DATA=DATA541 SIMPLE;
    MODEL MCOST=TOTP;
    OUTPUT OUT=DATA546
        P=MCSTHT
        R=RESID;
    MODEL IMCOST=ITOTP;
    OUTPUT OUT=DATA547
        P=IMCSTHT
        R=RESID;

PROC PLOT DATA=DATA546;
    PLOT RESID*TOTP/VREF=0;
    PLOT RESID*MCSTHT/VREF=0;
PROC PLOT DATA=DATA547;
    PLOT RESID*ITOTP/VREF=0;
    PLOT RESID*IMCSTHT/VREF=0;
PROC UNIVARIATE DATA=DATA546 PLOT NORMAL;
    VAR RESID;
PROC UNIVARIATE DATA=DATA547 PLOT NORMAL;
    VAR RESID;

PROC SORT DATA=DATA541 OUT=DATA541A;
    BY TOTP;
DATA DATAOUT.DATA541A;
    SET DATA541A;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP;
PROC RSREG DATA=DATA541A;
    MODEL MCOST=TOTP/LACKFIT;
PROC SORT DATA=DATA541 OUT=DATA541B;
    BY ITOTP;
DATA DATAOUT.DATA541B;
    SET DATA541B;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC RSREG DATA=DATA541B;
    MODEL IMCOST=ITOTP/LACKFIT;

DATA DATA541C;
    SET DATA541;
    IF NSHR GT 1 THEN NSHR=1;
DATA DATAOUT.DATA541C;
    SET DATA541C;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC SORT DATA=DATA541C OUT=DATA541D;
    BY NSHR;
DATA DATAOUT.DATA541D;
    SET DATA541D;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC PLOT DATA=DATA541D;
    PLOT MCOST*TOTP;
    BY NSHR;
PROC PLOT DATA=DATA541D;
    PLOT IMCOST*ITOTP;
    BY NSHR;
PROC UNIVARIATE DATA=DATA541D PLOT NORMAL;
    VAR MCOST;
    BY NSHR;
PROC UNIVARIATE DATA=DATA541D PLOT NORMAL;
    VAR IMCOST;
    BY NSHR;

PROC REG DATA=DATA541D SIMPLE;
    MODEL MCOST=TOTP;
    OUTPUT OUT=DATA546D
        P=MCSTHT
        R=RESID;
    BY NSHR;
PROC REG DATA=DATA541D SIMPLE;
    MODEL IMCOST=ITOTP;
    OUTPUT OUT=DATA547D
        P=IMCSTHT
        R=RESID;
    BY NSHR;
PROC PLOT DATA=DATA546D;
    PLOT RESID*TOTP/VREF=0;
    BY NSHR;
PROC PLOT DATA=DATA546D;
    PLOT RESID*MCSTHT/VREF=0;
    BY NSHR;
PROC PLOT DATA=DATA547D;
    PLOT RESID*ITOTP/VREF=0;
    BY NSHR;
PROC PLOT DATA=DATA547D;
    PLOT RESID*IMCSTHT/VREF=0;
    BY NSHR;
PROC UNIVARIATE DATA=DATA546D PLOT NORMAL;
    VAR RESID;
    BY NSHR;
PROC UNIVARIATE DATA=DATA547D PLOT NORMAL;
    VAR RESID;
    BY NSHR;

PROC SORT DATA=DATA541D OUT=DATA541E;
    BY NSHR TOTP;
DATA DATAOUT.DATA541E;
    SET DATA541E;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC RSREG DATA=DATA541E;
    MODEL MCOST=TOTP/LACKFIT;
    BY NSHR;
PROC SORT DATA=DATA541D OUT=DATA541F;
    BY NSHR ITOTP;
DATA DATAOUT.DATA541F;
    SET DATA541F;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC RSREG DATA=DATA541F;
    MODEL IMCOST=ITOTP/LACKFIT;
    BY NSHR;

DATA DATA541G;
    SET DATA541;
    IF PG GE 1 AND PG LE 9 THEN PG=1;
    IF PG GE 10 AND PG LE 19 THEN PG=2;
    IF PG GE 20 AND PG LE 23 THEN PG=3;
DATA DATAOUT.DATA541G;
    SET DATA541G;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC SORT DATA=DATA541G OUT=DATA541H;
    BY PG;
DATA DATAOUT.DATA541H;
    SET DATA541H;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC PLOT DATA=DATA541H;
    PLOT MCOST*TOTP;
    BY PG;
PROC PLOT DATA=DATA541H;
    PLOT IMCOST*ITOTP;
    BY PG;
PROC UNIVARIATE DATA=DATA541H PLOT NORMAL;
    VAR MCOST;
    BY PG;
PROC UNIVARIATE DATA=DATA541H PLOT NORMAL;
    VAR IMCOST;
    BY PG;

PROC REG DATA=DATA541H SIMPLE;
    MODEL MCOST=TOTP;
    OUTPUT OUT=DATA546H
        P=MCSTHT
        R=RESID;
    BY PG;
PROC REG DATA=DATA541H SIMPLE;
    MODEL IMCOST=ITOTP;
    OUTPUT OUT=DATA547H
        P=IMCSTHT
        R=RESID;
    BY PG;
PROC PLOT DATA=DATA546H;
    PLOT RESID*TOTP/VREF=0;
    BY PG;
PROC PLOT DATA=DATA546H;
    PLOT RESID*MCSTHT/VREF=0;
    BY PG;
PROC PLOT DATA=DATA547H;
    PLOT RESID*ITOTP/VREF=0;
    BY PG;
PROC PLOT DATA=DATA547H;
    PLOT RESID*IMCSTHT/VREF=0;
    BY PG;
PROC UNIVARIATE DATA=DATA546H PLOT NORMAL;
    VAR RESID;
    BY PG;
PROC UNIVARIATE DATA=DATA547H PLOT NORMAL;
    VAR RESID;
    BY PG;

PROC SORT DATA=DATA541H OUT=DATA541I;
    BY PG TOTP;
DATA DATAOUT.DATA541I;
    SET DATA541I;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC RSREG DATA=DATA541I;
    MODEL MCOST=TOTP/LACKFIT;
    BY PG;
DATA DATA541J;
    SET DATA541H;
PROC SORT DATA=DATA541H;
    BY PG ITOTP;
DATA DATAOUT.DATA541J;
    SET DATA541J;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC RSREG DATA=DATA541J;
    MODEL IMCOST=ITOTP/LACKFIT;
    BY PG;

DATA DATA541K;
    SET DATA541;
    IF NSHR GT 1 THEN NSHR=1;
    IF PG GE 1 AND PG LE 9 THEN PG=1;
    IF PG GE 10 AND PG LE 19 THEN PG=2;
    IF PG GE 20 AND PG LE 23 THEN PG=3;
DATA DATAOUT.DATA541K;
    SET DATA541K;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP;
PROC SORT DATA=DATA541K OUT=DATA541L;
    BY NSHR PG;
DATA DATAOUT.DATA541L;
    SET DATA541L;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP;
PROC PLOT DATA=DATA541L;
    PLOT MCOST*TOTP;
    BY NSHR PG;
PROC PLOT DATA=DATA541L;
    PLOT IMCOST*ITOTP;
    BY NSHR PG;
PROC UNIVARIATE DATA=DATA541L PLOT NORMAL;
    VAR MCOST;
    BY NSHR PG;
PROC UNIVARIATE DATA=DATA541L PLOT NORMAL;
    VAR IMCOST;
    BY NSHR PG;

PROC REG DATA=DATA541L SIMPLE;
    MODEL MCOST=TOTP;
    OUTPUT OUT=DATA546L
        P=MCSTHT
        R=RESID;
    BY NSHR PG;
PROC REG DATA=DATA541L SIMPLE;
    MODEL IMCOST=ITOTP;
    OUTPUT OUT=DATA547L
        P=IMCSTHT
        R=RESID;
    BY NSHR PG;
PROC PLOT DATA=DATA546L;
    PLOT RESID*TOTP/VREF=0;
    BY NSHR PG;
PROC PLOT DATA=DATA546L;
    PLOT RESID*MCSTHT/VREF=0;
    BY NSHR PG;
PROC PLOT DATA=DATA547L;
    PLOT RESID*ITOTP/VREF=0;
    BY NSHR PG;
PROC PLOT DATA=DATA547L;
    PLOT RESID*IMCSTHT/VREF=0;
    BY NSHR PG;
PROC UNIVARIATE DATA=DATA546L PLOT NORMAL;
    VAR RESID;
    BY NSHR PG;
PROC UNIVARIATE DATA=DATA547L PLOT NORMAL;
    VAR RESID;
    BY NSHR PG;

PROC SORT DATA=DATA541L OUT=DATA541M;
    BY NSHR PG TOTP;
DATA DATAOUT.DATA541M;
    SET DATA541M;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC RSREG DATA=DATA541M;
    MODEL MCOST=TOTP/LACKFIT;
    BY NSHR PG;
DATA DATA541N;
    SET DATA541L;
PROC SORT DATA=DATA541L;
    BY NSHR PG ITOTP;
DATA DATAOUT.DATA541N;
    SET DATA541N;
    KEEP PG NSHR HT BR MCOST IMCOST ITOTP TOTP NUMB;
PROC RSREG DATA=DATA541N;
    MODEL IMCOST=ITOTP/LACKFIT;
    BY NSHR PG;
OPTIONS LINESIZE=80;